Identifying Foreign Person Names in Chinese Text

Authors

  • Stephan Busemann
  • Yajing Zhang
Abstract

Foreign name expressions written in Chinese characters are difficult to recognize because the character sequence represents the Chinese pronunciation of the name. This paper suggests that known English or German person names can be reliably identified on the basis of the similarity between the Chinese and the foreign pronunciation. Beyond locating a person name in the text and learning that it is foreign, the approach identifies the corresponding foreign name, gaining valuable additional information for cross-lingual applications. This idea is implemented as a statistical module within a rule-based named entity recognition system.
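The abstract does not specify the similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions. The Chinese name is romanized to toneless pinyin with a tiny hypothetical character table, each known foreign name is crudely respelled with a few assumed rules, and the two strings are compared with normalized Levenshtein distance. Edit distance is a swapped-in measure, not necessarily the paper's statistical model, and all names here (`PINYIN`, `SPELLING_RULES`, `KNOWN_NAMES`, `identify`) are hypothetical.

```python
from typing import Optional

# Hypothetical mini lookup table: characters common in transliterations,
# mapped to toneless pinyin. A real system needs a full dictionary.
PINYIN = {
    "施": "shi", "密": "mi", "特": "te",
    "迈": "mai", "克": "ke", "尔": "er",
}

# Assumed spelling rules that move English/German spellings closer to
# their pronunciation; applied in order ("sch" before "ch").
SPELLING_RULES = [("sch", "sh"), ("ch", "k"), ("ck", "k"), ("ph", "f"), ("th", "t")]

# Hypothetical list of known foreign person names.
KNOWN_NAMES = ["Schmidt", "Michael", "Mueller"]


def to_pinyin(chinese: str) -> str:
    """Romanize a Chinese name by concatenating toneless pinyin syllables."""
    return "".join(PINYIN[ch] for ch in chinese)


def normalize(foreign: str) -> str:
    """Lowercase a foreign name and apply the assumed spelling rules."""
    s = foreign.lower()
    for old, new in SPELLING_RULES:
        s = s.replace(old, new)
    return s


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def identify(chinese: str, threshold: float = 0.5) -> Optional[tuple[str, float]]:
    """Return (best-matching known foreign name, score), or None below threshold."""
    romanized = to_pinyin(chinese)
    best = max(KNOWN_NAMES, key=lambda name: similarity(romanized, normalize(name)))
    score = similarity(romanized, normalize(best))
    return (best, score) if score >= threshold else None


if __name__ == "__main__":
    for name in ["施密特", "迈克尔", "克"]:
        print(name, identify(name))
```

With these assumed tables, 施密特 (shimite) is matched to Schmidt and 迈克尔 (maikeer) to Michael, both with a score of 4/7, while the single character 克 scores below the 0.5 threshold and is rejected as not a known foreign name. Edit distance on spellings is only a rough proxy for pronunciation similarity; a practical module would likely need a model of how foreign sounds map onto Chinese syllables.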

Similar resources

Linguistic and Acoustic Analysis of Chinese Person Names

In this paper, we report the results of our recent study of Chinese person names. The analysis is based on a corpus of 1 million names. The results include the syllable lengths and surname composition of the names in the corpus, statistical results and analysis for full names and given names, tonal pattern analysis of Chinese full and given names, and name confusion analysis when a given number of names are ex...

An Interpretative Data Analysis of Chinese Named Entity Subtypes

"In assessing the performance of information extraction systems, we are interested in knowing the classes of errors made and the circumstances in which they are made."[1] However, to date the Tipster scoring categories (correct, partial, incorrect, spurious, missing, and noncommittal) have not been applied to classes of data based on structural distinctions in the language, or on semantic subcl...

Extracting Names From Arabic Text for Question-Answering Systems

Tagging and extracting proper names is key to improving the effectiveness of question-answering systems. Valuable information in a text is usually located around proper names; to collect this information, the names must first be found. By extracting proper names from the text we provide question-answering systems with both the proper name found in the text, some information about it...

Rule-based Person Name Recognition for Xinjiang Minority Languages

Multi-ethnic named entity recognition for Xinjiang is an important part of multilingual processing. In this paper, we analyze the patterns of Uighur and Kazak person names and perform person name recognition using a rule-based approach. We also propose and implement rules for Uighur and Kazak word segmentation.

Training Multi-Classifiers for Chinese Unknown Word Detection

According to a corpus survey, the majority of unknown words in Chinese texts are numbers, time nouns, and person names. Detecting numbers and time nouns is a trivial task. These three types of unknown words may need different feature sets and parameters to achieve optimal results. For example, characters used for Chinese family names help in Chinese person name detection. Therefore, we p...

Publication date: 2008